Feature/bq incremental strategy insert_overwrite #2153
Conversation
@jtcohen6 this is great! And self-contained! I think I just convinced myself that we should try to ship this for 0.16.0.... i can't think of any reason at all why we should not do that. Can you?
{%- set predicates = [] if predicates is none else [] + predicates -%}
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

merge into {{ target }} as DBT_INTERNAL_DEST
i could live 1,000 more years and i would still not understand this DML...
I have lived 2 days and I have read the documentation, and I now understand this DML.
The key thing I was missing: when not matched by source is always true here because we're using a constant-false predicate. I was errantly thinking that we were still merging on a unique_key, which we are not.
From the docs:
If the merge_condition is FALSE, the query optimizer avoids using a JOIN. This optimization is referred to as a constant false predicate. A constant false predicate is useful when you perform an atomic DELETE on the target plus an INSERT from a source (DELETE with INSERT is also known as a REPLACE operation).
Cool!
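To make the mechanics concrete, the statement this strategy generates looks roughly like the sketch below. This is a hand-written illustration rather than the macro's literal output, and the dataset, column, and partition values are made up:

```sql
merge into `my-project`.`analytics`.`events` as DBT_INTERNAL_DEST
using (
    -- the new rows, built from the model SQL into a temp table
    select * from `my-project`.`analytics`.`events__dbt_tmp`
) as DBT_INTERNAL_SOURCE
-- constant false predicate: BigQuery never joins source to target
on FALSE

-- every target row is therefore "not matched by source"; the extra predicate
-- limits the delete to the partitions being replaced
when not matched by source
     and date(DBT_INTERNAL_DEST.event_time) in (date '2020-03-01', date '2020-03-02')
    then delete

-- every source row is "not matched", so all of the new rows are inserted
when not matched then
    insert (`event_time`, `user_id`, `page_path`)
    values (`event_time`, `user_id`, `page_path`)
```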
Neat! As part of finalizing my Discourse post about new BQ partitioning + incremental modeling in 0.16.0, I'm going to test this strategy on a (public) dataset of some size.
(Branch feature/bq-insert-overwrite updated from 273f0e2 to 0656477.)
The test failures here appear to be intermittent weirdness on Snowflake's end... Looks like it was returning Arrow data instead of JSON data? Merging this one for 0.16.0!
@jtcohen6 @drewbanin
This is a small feature that builds on top of the tremendous work from #2140. It shouldn't have any breaking changes, so I think we could ship it in 0.16.1. I'm opening this now so that I can link to it in a forthcoming post about dbt + BigQuery + incremental models.
A common request from the community is an incremental materialization on BigQuery to just drop and replace an entire day of data. By setting incremental_strategy = "insert_overwrite" in the config, any partition with new data will be completely dropped and recreated.

Example usage:
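Something along these lines — the model, column names, and partition values here are placeholders, so treat it as an illustrative sketch rather than copy-paste-ready config:

```sql
-- models/events_daily.sql

{% set partitions_to_replace = [
    'timestamp(current_date)',
    'timestamp(date_sub(current_date, interval 1 day))'
] %}

{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        partition_by = {'field': 'session_start', 'data_type': 'timestamp'},
        partitions = partitions_to_replace
    )
}}

select
    session_start,
    user_id,
    page_path
from {{ ref('stg_events') }}

{% if is_incremental() %}
    -- only rebuild the partitions listed above
    where timestamp_trunc(session_start, day) in ({{ partitions_to_replace | join(',') }})
{% endif %}
```

With a config like this, each incremental run replaces only the listed partitions in the target table rather than merging on a unique_key.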